Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Juniper_AIML
by New Contributor
  • 2853 Views
  • 1 replies
  • 0 kudos

How to set up an instance profile for initializing a Databricks cluster using Docker?

I was trying to start a Databricks cluster from a Docker image. I followed the setup instructions, except for the additional step of setting up the IAM role and instance profile, as I was facing issues. The image is stored on AWS ECR in a public repo...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello again, @Aman Gaurav​. Thanks for your second question. As before, we'll see what the community has to say first.

fff_ds
by New Contributor
  • 1271 Views
  • 1 replies
  • 1 kudos

Manual overwrite in s3 console of a collection of parquet files and now we can't read them.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.

Mihai1
by New Contributor III
  • 2612 Views
  • 3 replies
  • 4 kudos

Resolved! MLflow Model Serving on Azure Databricks General Availability

When is MLflow Model Serving on Azure Databricks expected to become Generally Available?

Latest Reply
User16764241763
Honored Contributor
  • 4 kudos

Hello Mihai, we plan to GA Model Serving by the end of this year, as we are working on a lot of improvements.

2 More Replies
omran
by New Contributor II
  • 3167 Views
  • 2 replies
  • 2 kudos

Resolved! What is the procedure for using the IPOPT solver in Azure Databricks?

I have installed version 3.11.1 of the ipopt solver in Azure Databricks, but running the code throws an error: WARNING: Could not locate the 'ipopt' executable, which is required for solver ipopt ApplicationError: No executable found for solv...

Latest Reply
User16764241763
Honored Contributor
  • 2 kudos

Hello @omran shaik. In the past, we have recommended that customers use Docker containers with Databricks, as some of these solvers required native compilation and did not work well on the runtimes. With DCS you have full control of what you want to in...

1 More Replies
Anonymous
by Not applicable
  • 833 Views
  • 0 replies
  • 1 kudos

The Next Databricks Office Hours

Our next Office Hours session is scheduled for February 23, 2022 - 8:00 am PDT. Do you have questions about how to set up or use Databricks? Do you want to get best practices for deploying your use case or tips on data a...

ninjadev999
by New Contributor II
  • 6978 Views
  • 7 replies
  • 1 kudos

Resolved! Can't write big DataFrame into MSSQL server by using jdbc driver on Azure Databricks

I'm reading a huge CSV file containing 39,795,158 records and writing it into MSSQL Server on Azure Databricks. The Databricks notebook is running on a cluster node with 56 GB memory, 16 cores, and 12 workers. This is my code in Python and PySpark: from ...

Latest Reply
User16764241763
Honored Contributor
  • 1 kudos

Hi, if you are using Azure SQL DB Managed Instance, could you please file a support request with the Azure team? This is to review any timeouts or perf issues on the backend. Also, it seems like the timeout is coming from SQL Server, which is closing the conn...

6 More Replies
my_community2
by New Contributor III
  • 3071 Views
  • 1 replies
  • 2 kudos

Resolved! SQL cast operator not working properly

Please have a look at the attached screenshot. Three strings converted to float, each resulting in the same number:
22015683.000000000000000000 => 22015684
22015684.000000000000000000 => 22015684
22015685.000000000000000000 => 22015684

Latest Reply
MartinB
Contributor III
  • 2 kudos

Hi @Maciej G, I guess this has something to do with the data type FLOAT and its precision. Floats are only an approximation with a given precision. Either you should consider using data type DOUBLE (double precision compared to FLOAT) - or, if you ...
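
The rounding reported in the question can be reproduced outside Spark. The following is a minimal sketch, assuming Spark SQL's FLOAT behaves as a 32-bit IEEE 754 value; the helper name `to_float32` is hypothetical and not part of any Spark API. It round-trips each value through a 4-byte float with Python's `struct` module and compares the result with a 64-bit double and with `decimal.Decimal`.

```python
import struct
from decimal import Decimal


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


# The three strings from the question.
inputs = [
    "22015683.000000000000000000",
    "22015684.000000000000000000",
    "22015685.000000000000000000",
]

# A 32-bit float has a 24-bit significand, so between 2**24 and 2**25
# only even integers are representable; odd inputs get rounded.
as_float32 = [to_float32(float(s)) for s in inputs]

# A 64-bit double represents all three integers exactly.
as_double = [float(s) for s in inputs]

# Decimal keeps the digits exactly as written.
as_decimal = [Decimal(s) for s in inputs]

for s, f32, f64 in zip(inputs, as_float32, as_double):
    print(f"{s} -> float32 {f32:.0f}, double {f64:.0f}")
```

All three inputs land on the same 32-bit value, while the double and decimal versions stay distinct, which is consistent with the suggestion to use DOUBLE instead of FLOAT.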

jimnaik
by New Contributor III
  • 20279 Views
  • 2 replies
  • 1 kudos

Resolved! How to execute .sh and .py files in the workspace?

I want to execute a shell script that runs a .py file. How do I run .sh and .py files in a Databricks workspace?

Latest Reply
jimnaik
New Contributor III
  • 1 kudos

I tried executing like this and it worked: %sh /dbfs/***/***/***.sh

1 More Replies
IvNen
by New Contributor II
  • 2846 Views
  • 1 replies
  • 1 kudos

Azure Functions error communicating with a Databricks notebook

I have a connection between Azure Functions and a Databricks notebook to pull data from the notebook. That was working fine until the 7th of Feb, but then I started getting an error without a sensible error code. I have attached the stack trace and the err...

Latest Reply
IvNen
New Contributor II
  • 1 kudos

A notebook on a Databricks cluster keeps a cache while the cluster is running. If you add an import statement and then remove it, the notebook still has a cached instance of that import and will continue to work. Running code in Visual Studio against t...

Braxx
by Contributor II
  • 2950 Views
  • 2 replies
  • 3 kudos

Resolved! Issue with rounding selected columns in a "for in" loop

This must be trivial, but I must have missed something. I have a dataframe (test1) and want to round all the columns listed in a list of columns (col_list). Here is the code I am running: col_list = ['measure1', 'measure2', 'measure3']   for i in col_list:...

Latest Reply
Braxx
Contributor II
  • 3 kudos

You're absolutely right. Thanks.

1 More Replies
alejandrofm
by Valued Contributor
  • 5118 Views
  • 2 replies
  • 3 kudos

Resolved! Running VACUUM on each table

Hi, in line with my question about OPTIMIZE, this is the next step: with a retention of 7 days, I could execute VACUUM on all tables once a week. Is this a recommended procedure? How can I know if I'll be getting any benefit from VACUUM, without DRY RU...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

Ideally 7 days is recommended, but discuss with data stakeholders to identify what's suitable: 7, 14, or 28 days. To use VACUUM, first run some analytics on the behaviour of your data. Identify % of operations that perform updates and deletes vs insert operati...

1 More Replies
ShriS1221
by New Contributor II
  • 7389 Views
  • 2 replies
  • 0 kudos

Removing newline characters from a Spark dataframe column

I have to remove newline characters from an entire column of a dataframe. I tried regex_replace, but it's not working. Please help me with this.
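
One thing worth checking is the function name: PySpark's function is `regexp_replace` (in `pyspark.sql.functions`), not `regex_replace`. The pattern itself can be tried in plain Python before it goes into Spark. The following is a minimal sketch, assuming the goal is to replace each run of carriage returns and line feeds with a single space; the helper name `strip_newlines`, the sample strings, and the column name `text` in the comment are all hypothetical.

```python
import re

# Matches one or more consecutive carriage returns or line feeds.
NEWLINES = re.compile(r"[\r\n]+")


def strip_newlines(value: str, replacement: str = " ") -> str:
    """Replace every run of newline characters with `replacement` (hypothetical helper)."""
    return NEWLINES.sub(replacement, value)


# Hypothetical sample values standing in for the dataframe column.
samples = ["line one\nline two", "windows\r\nstyle", "no newline"]
cleaned = [strip_newlines(s) for s in samples]
print(cleaned)

# Assumed PySpark equivalent for a column named "text" (not run here):
#   from pyspark.sql.functions import regexp_replace
#   df = df.withColumn("text", regexp_replace("text", r"[\r\n]+", " "))
```

The character class covers both Unix (`\n`) and Windows (`\r\n`) line endings, so mixed data is handled by one pattern.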

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please provide an example of your data and the code you've tried?

1 More Replies
NOOR_BASHASHAIK
by Contributor
  • 1694 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks PAT (personal access token) with selective access to databases

Hi all, I am establishing a connection to Databricks from Collibra through the Spark driver. Collibra expects these details for the connection (for token based):
  • personal access token (pat)
  • server/workspace name
  • httpPath
Upon successful connection, Collibra d...

Latest Reply
Atanu
Databricks Employee
  • 0 kudos

A PAT is tied to the workspace, so it gets access to everything in Hive. Is there any way you can filter this on the Collibra side?

jeffreym9
by New Contributor III
  • 3829 Views
  • 4 replies
  • 0 kudos

Resolved! Hive version after upgrading Azure Databricks from 6.4 (Spark 2) to 9.1 (Spark 3)

I have upgraded Azure Databricks from 6.4 to 9.1, which enables me to use Spark 3. As far as I know, the Hive version has to be upgraded to 2.3.7 as well, as discussed in: https://community.databricks.com/s/question/0D53f00001HKHy2CAH/how-to-upgrade-...

Latest Reply
jeffreym9
New Contributor III
  • 0 kudos

I'm asking about Databricks version 9.1. I've followed the URL given (https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore). Do you mind letting me know where in the table it mentions the supported Hive version fo...

3 More Replies
