Data Engineering

Forum Posts

Sorted by:

by User16826992666 • Valued Contributor

06-16-2021 10:51:24 AM

1237 Views
3 replies
0 kudos

If our company has an Enterprise Git server deployed on a private network, can we use Repos?

Our team would like to use the Repos functionality but our security prevents outside traffic through public networks. Is there any way we can still use Repos?

Data Engineering

1237 Views
3 replies
0 kudos

06-16-2021 10:51:24 AM

View Replies

Latest Reply

User16781336501
New Contributor III

12-02-2021 10:38:45 AM

0 kudos

Please contact your account team for some options that are in preview right now.

0 kudos

12-02-2021 10:38:45 AM

2 More Replies

by Siddhesh2525 • New Contributor III

12-01-2021 3:16:56 AM

4481 Views
2 replies
6 kudos

How to pass dynamic value in databricks

I have separate column value defined in 13 diffrent notebook and i want merge into 1 databrick notebook and want to pass dynamic parameter using databrick so it will help me to run in single databricks notebook .

Data Engineering

4481 Views
2 replies
6 kudos

12-01-2021 3:16:56 AM

View Replies

Latest Reply

Prabakar
Esteemed Contributor III

12-02-2021 7:06:37 AM

6 kudos

Hi @siddhesh Bhavar you can use widgets with the %run command to achieve this. https://docs.databricks.com/notebooks/widgets.html#use-widgets-with-run%run /path/to/notebook $X="10" $Y="1"

6 kudos

12-02-2021 7:06:37 AM

1 More Replies

by William_Scardua • Valued Contributor

12-01-2021 4:13:39 PM

4055 Views
5 replies
12 kudos

The database and tables disappears when I delete the cluster

Hi guys,I have a trial databricks account, I realized that when I shutdown the cluster my databases and tables is disappear .. that is correct or thats is because my account is trial ?

Data Engineering

4055 Views
5 replies
12 kudos

12-01-2021 4:13:39 PM

View Replies

Latest Reply

Prabakar
Esteemed Contributor III

12-02-2021 5:04:04 AM

12 kudos

@William Scardua if it's an external hive metastore or Glue catalog you might be missing the configuration on the cluster. https://docs.databricks.com/data/metastores/index.htmlAlso as mentioned by @Hubert Dudek , if it's a community edition then t...

12 kudos

12-02-2021 5:04:04 AM

4 More Replies

by William_Scardua • Valued Contributor

11-29-2021 7:15:15 AM

5301 Views
6 replies
3 kudos

Resolved! How do you create a Sandbox in your data environment ?

Hi guys,How do you create a Sandbox in your data environment ? have any idea ?Azzure/AWS + Data Lake + Databricks

Data Engineering

5301 Views
6 replies
3 kudos

11-29-2021 7:15:15 AM

View Replies

Latest Reply

missyT
New Contributor III

11-29-2021 8:25:40 PM

3 kudos

In a sandbox environment, you will find the Designer enabled. You can activate Designer by selecting the design icon Designer. on a page, or by choosing the Design menu item in the Settings Settings menu.

3 kudos

11-29-2021 8:25:40 PM

5 More Replies

by Chris_Shehu • Valued Contributor III

12-01-2021 10:38:11 AM

4388 Views
2 replies
10 kudos

Resolved! When trying to use pyodbc connector to write files to SQL server receiving error. java.lang.ClassNotFoundException Any alternatives or ways to fix this?

jdbcUsername = ******** jdbcPassword = *************** server_name = "jdbc:sqlserver://***********:******" database_name = "********" url = server_name + ";" + "databaseName=" + database_name + ";" table_name = "PatientTEST" try: df.write \ ...

Data Engineering

4388 Views
2 replies
10 kudos

12-01-2021 10:38:11 AM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

12-01-2021 11:15:20 AM

10 kudos

please check following code:df.write.jdbc( url="jdbc:sqlserver://<host>:1433;database=<db>;user=<user>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;driver=com.microsof...

10 kudos

12-01-2021 11:15:20 AM

1 More Replies

by Ericsson • New Contributor II

12-01-2021 8:45:17 AM

1560 Views
2 replies
1 kudos

SQL week format issue its not showing result as 01(ww)

Hi Folks,I've requirement to show the week number as ww format. Please see the below codeselect weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). also plz refre the screen shot for result.

Data Engineering

1560 Views
2 replies
1 kudos

12-01-2021 8:45:17 AM

View Replies

Latest Reply

Lauri
New Contributor III

12-01-2021 10:33:59 AM

1 kudos

You can use lpad() to achieve the 'ww' format.

1 kudos

12-01-2021 10:33:59 AM

1 More Replies

by Braxx • Contributor II

11-23-2021 6:50:41 AM

6424 Views
12 replies
2 kudos

Resolved! Validate a schema of json in column

I have a dataframe like below with col2 as key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many of pairs, sometimes less, sometimes more and this is fine as long as the structure is fine. Nulls in col...

Data Engineering

6424 Views
12 replies
2 kudos

11-23-2021 6:50:41 AM

View Replies

Latest Reply

Anonymous
Not applicable

12-01-2021 8:41:53 AM

2 kudos

@Bartosz Wachocki - Thank you for sharing your solution and marking it as best.

2 kudos

12-01-2021 8:41:53 AM

11 More Replies

by pjp94 • Contributor

11-30-2021 4:15:13 PM

3251 Views
13 replies
5 kudos

Pyspark vs Pandas

Would like to better understand the advantage of writing a python notebook in pyspark vs pandas. Does the entire notebook need to be written in pyspark to realize the performance benefits. I currently have a script using pandas for all my transformat...

Data Engineering

3251 Views
13 replies
5 kudos

11-30-2021 4:15:13 PM

View Replies

Latest Reply

cconnell
Contributor II

12-01-2021 8:05:38 AM

5 kudos

You can use the free Community Edition of Databricks that includes 10.0 runtime.

5 kudos

12-01-2021 8:05:38 AM

12 More Replies

by mangeldfz • New Contributor III

11-17-2021 1:13:35 AM

5183 Views
8 replies
8 kudos

Resolved! mlflow RESOURCE_ALREADY_EXISTS

I tried to log some run in my Databricks Workspace and I'm facing the following error: RESOURCE_ALREADY_EXISTS when I try to log any run.I could replicate the error with the following code:import mlflow import mlflow.sklearn from mlflow.tracking impo...

Data Engineering

5183 Views
8 replies
8 kudos

11-17-2021 1:13:35 AM

View Replies

Latest Reply

Prabakar
Esteemed Contributor III

11-17-2021 8:10:24 AM

8 kudos

Hi @Miguel Ángel Fernández it’s not recommended to “link” the Databricks and AML workspaces, as we are seeing more problems. You can refer to the instructions found below for using MLflow with AML. https://docs.microsoft.com/en-us/azure/machine-l...

8 kudos

11-17-2021 8:10:24 AM

7 More Replies

by sarvesh • Contributor III

12-01-2021 5:11:00 AM

2735 Views
4 replies
3 kudos

read percentage values in spark ( no casting )

I have a xlsx file which has a single column ;percentage30%40%50%-10%0.00%0%0.10%110%99.99%99.98%-99.99%-99.98%when i read this using Apache-Spark out put i get is,|percentage|+----------+| 0.3|| 0.4|| 0.5|| -0.1|| 0.0|| ...

Data Engineering

2735 Views
4 replies
3 kudos

12-01-2021 5:11:00 AM

View Replies

Latest Reply

-werners-
Esteemed Contributor III

12-01-2021 5:42:43 AM

3 kudos

Affirmative. This is how excel stores percentages. What you see is just cell formatting.Databricks notebooks do not (yet?) have the possibility to format the output.But it is easy to use a BI tool on top of Databricks, where you can change the for...

3 kudos

12-01-2021 5:42:43 AM

3 More Replies

by sarvesh • Contributor III

11-22-2021 9:51:42 PM

19802 Views
18 replies
6 kudos

Resolved! java.lang.OutOfMemoryError: GC overhead limit exceeded. [ solved ]

solution :- i don't need to add any executor or driver memory all i had to do in my case was add this : - option("maxRowsInMemory", 1000). Before i could n't even read a 9mb file now i just read a 50mb file without any error.{ val df = spark.read .f...

Data Engineering

19802 Views
18 replies
6 kudos

11-22-2021 9:51:42 PM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-23-2021 3:07:12 AM

6 kudos

can you try without: .set("spark.driver.memory","4g") .set("spark.executor.memory", "6g")It is clearly show that there is no 4gb free on driver and 6gb free on executor (you can share hardware cluster details also).You can not also allocate 100% for ...

6 kudos

11-23-2021 3:07:12 AM

17 More Replies

by SailajaB • Valued Contributor III

11-30-2021 11:45:35 PM

9953 Views
9 replies
6 kudos

How to send a list as parameter in databricks notebook task

Hi,How we can pass a list as parameter in data bricks notebook to run the notebook parallelly for list of values.Thank you

Data Engineering

9953 Views
9 replies
6 kudos

11-30-2021 11:45:35 PM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

12-01-2021 4:02:23 AM

6 kudos

another another way (in databricks you can achieve everything many ways) is to encode list using json library:import json print type(json.dumps([1, 2, 3])) #>> <type 'str'>

6 kudos

12-01-2021 4:02:23 AM

8 More Replies

by WillJMSFT • New Contributor III

11-29-2021 9:02:03 PM

1920 Views
6 replies
7 kudos

Resolved! How to import SqlDWRelation from com.databricks.spark.sqldw

Hello, All - I'm working on a project using the SQL DataWarehouse connector built into Databricks (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html). From there, I'm trying to extract information from the logical plan / logi...

Data Engineering

1920 Views
6 replies
7 kudos

11-29-2021 9:02:03 PM

View Replies

Latest Reply

WillJMSFT
New Contributor III

11-30-2021 5:57:19 AM

7 kudos

@Werner Stinckens Thanks for the reply! The SQL DW Connector itself is working just fine and I can retrieve the results from the SQL DW. I'm trying to extract the metadata (i.e. the Server, Database, and Table name) from the logical plan (or throu...

7 kudos

11-30-2021 5:57:19 AM

5 More Replies

by Dileep_Vidyadar • New Contributor III

11-23-2021 9:58:14 AM

1946 Views
7 replies
5 kudos

Not Able to create Cluster on Community Edition for 3-4 days.

I am learning Pyspark on Community edition for a like month. It's been great until I am facing issues while creating a cluster for 3-4 Days.Sometimes it is taking 30 minutes to 60 minutes to create a Cluster and sometimes it is not even creating a Cl...

Data Engineering

1946 Views
7 replies
5 kudos

11-23-2021 9:58:14 AM

View Replies

Latest Reply

Anonymous
Not applicable

11-30-2021 10:20:53 AM

5 kudos

@Dileep Vidyadara - If your question was fully answered by @Hubert Dudek, would you be happy to mark his answer as best?

5 kudos

11-30-2021 10:20:53 AM

6 More Replies

by All_Users • New Contributor II

11-30-2021 8:25:30 AM

865 Views
0 replies
1 kudos

How do you upload a folder of csv files from your local machine into the Databricks platform?

I am working with time-series data, where each day is a separate csv file. I have tried to load a zip file to FileStore but then cannot use the magic command to unzip, most likely because it is in the tmp folder. Is there a workaround for this proble...

Data Engineering

865 Views
0 replies
1 kudos

11-30-2021 8:25:30 AM

User

Count

1601

736

343

284

247

Databricks

Forum Posts

If our company has an Enterprise Git server deployed on a private network, can we use Repos?

How to pass dynamic value in databricks

The database and tables disappears when I delete the cluster

Resolved! How do you create a Sandbox in your data environment ?

Resolved! When trying to use pyodbc connector to write files to SQL server receiving error. java.lang.ClassNotFoundException Any alternatives or ways to fix this?

SQL week format issue its not showing result as 01(ww)

Resolved! Validate a schema of json in column

Pyspark vs Pandas

Resolved! mlflow RESOURCE_ALREADY_EXISTS

read percentage values in spark ( no casting )

Resolved! java.lang.OutOfMemoryError: GC overhead limit exceeded. [ solved ]

How to send a list as parameter in databricks notebook task

Resolved! How to import SqlDWRelation from com.databricks.spark.sqldw

Not Able to create Cluster on Community Edition for 3-4 days.

How do you upload a folder of csv files from your local machine into the Databricks platform?

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...