Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Chris_Shehu
by Valued Contributor III
  • 5260 Views
  • 2 replies
  • 10 kudos

Resolved! When trying to use the pyodbc connector to write files to SQL Server, receiving error java.lang.ClassNotFoundException. Any alternatives or ways to fix this?

jdbcUsername = ********
jdbcPassword = ***************
server_name = "jdbc:sqlserver://***********:******"
database_name = "********"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "PatientTEST"
try:
    df.write \ ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Please check the following code:

df.write.jdbc(
    url="jdbc:sqlserver://<host>:1433;database=<db>;user=<user>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;driver=com.microsof...
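For completeness, a runnable sketch of that suggestion (host, database, table, and credentials are placeholders; assumes the Microsoft SQL Server JDBC driver is available on the cluster):

# Sketch: write a DataFrame to SQL Server over JDBC (placeholder connection details).
jdbc_url = (
    "jdbc:sqlserver://<host>:1433;"
    "database=<db>;user=<user>;password=<password>;"
    "encrypt=true;trustServerCertificate=false;loginTimeout=30;"
)
(df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "PatientTEST")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .mode("append")
    .save())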

1 More Replies
Ericsson
by New Contributor II
  • 2373 Views
  • 2 replies
  • 1 kudos

SQL week format issue: it's not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in 'ww' format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.

[screenshot: result]
Latest Reply
Lauri
New Contributor III
  • 1 kudos

You can use lpad() to achieve the 'ww' format.
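For example, a sketch of that suggestion applied to the expression from the question (zero-padding the week number to two digits):

# Sketch: lpad() the week number so week 1 renders as "01".
spark.sql(
    "SELECT lpad(CAST(weekofyear(date_add(current_date, 35)) AS STRING), 2, '0') AS week_ww"
).show()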

1 More Replies
Braxx
by Contributor II
  • 10436 Views
  • 12 replies
  • 2 kudos

Resolved! Validate a schema of json in column

I have a dataframe like below with col2 as key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many pairs, sometimes fewer, sometimes more, and this is fine as long as the structure is fine. Nulls in col...

[screenshot: sample dataframe]
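A common way to do this (not necessarily the exact accepted solution, which is truncated in this excerpt) is to parse the JSON column against an expected schema and keep only the rows that parse cleanly; a minimal sketch, with the map schema assumed:

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import MapType, StringType

# Rows whose col2 fails to parse as a string->string map yield NULL.
parsed = df.withColumn("parsed", from_json(col("col2"), MapType(StringType(), StringType())))
valid = parsed.filter(col("parsed").isNotNull()).drop("parsed")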
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Bartosz Wachocki​ - Thank you for sharing your solution and marking it as best.

11 More Replies
pjp94
by Contributor
  • 5721 Views
  • 13 replies
  • 5 kudos

Pyspark vs Pandas

Would like to better understand the advantage of writing a Python notebook in pyspark vs pandas. Does the entire notebook need to be written in pyspark to realize the performance benefits? I currently have a script using pandas for all my transformat...

Latest Reply
cconnell
Contributor II
  • 5 kudos

You can use the free Community Edition of Databricks that includes 10.0 runtime.
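DBR 10.0 ships Spark 3.2, which includes the pandas API on Spark, so pandas-style code can run distributed without a full rewrite; an illustrative sketch (path and column names are hypothetical):

import pyspark.pandas as ps

# pandas-like syntax, executed on the Spark cluster rather than a single node.
psdf = ps.read_csv("dbfs:/path/to/data.csv")
psdf["total"] = psdf["price"] * psdf["qty"]
print(psdf.groupby("category")["total"].sum())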

12 More Replies
mangeldfz
by New Contributor III
  • 8186 Views
  • 8 replies
  • 8 kudos

Resolved! mlflow RESOURCE_ALREADY_EXISTS

I tried to log some runs in my Databricks workspace and I'm facing the following error: RESOURCE_ALREADY_EXISTS when I try to log any run. I could replicate the error with the following code: import mlflow import mlflow.sklearn from mlflow.tracking impo...

Latest Reply
Prabakar
Esteemed Contributor III
  • 8 kudos

Hi @Miguel Ángel Fernández, it's not recommended to "link" the Databricks and AML workspaces, as we are seeing more problems. You can refer to the instructions below for using MLflow with AML. https://docs.microsoft.com/en-us/azure/machine-l...
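For reference, a minimal sketch of logging runs against the workspace-native MLflow tracking server instead (the experiment path is a placeholder):

import mlflow

# Log to the Databricks workspace tracking server; no AML workspace link needed.
mlflow.set_experiment("/Users/<user>/demo-experiment")
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.72)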

7 More Replies
sarvesh
by Contributor III
  • 3870 Views
  • 4 replies
  • 3 kudos

Read percentage values in Spark (no casting)

I have an xlsx file which has a single column, percentage: 30%, 40%, 50%, -10%, 0.00%, 0%, 0.10%, 110%, 99.99%, 99.98%, -99.99%, -99.98%. When I read this using Apache Spark, the output I get is:
|percentage|
+----------+
|       0.3|
|       0.4|
|       0.5|
|      -0.1|
|       0.0|
| ...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Affirmative. This is how Excel stores percentages; what you see is just cell formatting. Databricks notebooks do not (yet?) have the possibility to format the output. But it is easy to use a BI tool on top of Databricks, where you can change the for...
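If a percentage-style rendering is needed in the notebook anyway, the decimal can be formatted back into a string; a small sketch (assumes the column read from the file is named percentage):

from pyspark.sql.functions import format_string, col

# 0.3 -> "30.00%"; purely cosmetic, the stored value stays numeric.
df.select(format_string("%.2f%%", col("percentage") * 100).alias("percentage")).show()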

3 More Replies
sarvesh
by Contributor III
  • 26982 Views
  • 18 replies
  • 6 kudos

Resolved! java.lang.OutOfMemoryError: GC overhead limit exceeded. [ solved ]

Solution: I didn't need to add any executor or driver memory; all I had to do in my case was add this: .option("maxRowsInMemory", 1000). Before, I couldn't even read a 9 MB file; now I just read a 50 MB file without any error. { val df = spark.read .f...

[screenshots: Spark UI]
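The truncated snippet boils down to enabling the streaming reader in the spark-excel connector; a PySpark sketch of the same fix (file path is a placeholder):

# Sketch: stream the workbook instead of materializing it all in memory.
df = (spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("maxRowsInMemory", 1000)  # streaming reader; avoids GC overhead errors
      .load("dbfs:/FileStore/big_file.xlsx"))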
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Can you try without .set("spark.driver.memory","4g") and .set("spark.executor.memory", "6g")? It clearly shows that there is not 4 GB free on the driver and 6 GB free on the executor (you can also share the cluster hardware details). You also cannot allocate 100% for ...

17 More Replies
SailajaB
by Valued Contributor III
  • 13212 Views
  • 9 replies
  • 6 kudos

How to send a list as a parameter in a Databricks notebook task

Hi, how can we pass a list as a parameter to a Databricks notebook, to run the notebook in parallel for a list of values? Thank you.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

Another way (in Databricks you can achieve everything many ways) is to encode the list using the json library:

import json
print(type(json.dumps([1, 2, 3])))  # <class 'str'>
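A sketch of the full round trip through notebook parameters (notebook path and parameter name are placeholders):

import json

# Parent notebook: serialize the list and pass it as a string parameter.
dbutils.notebook.run("child_notebook", 600, {"values": json.dumps([1, 2, 3])})

# Child notebook: read the widget and decode it back into a list.
values = json.loads(dbutils.widgets.get("values"))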

8 More Replies
WillJMSFT
by New Contributor III
  • 2974 Views
  • 6 replies
  • 7 kudos

Resolved! How to import SqlDWRelation from com.databricks.spark.sqldw

Hello, All - I'm working on a project using the SQL DataWarehouse connector built into Databricks (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html). From there, I'm trying to extract information from the logical plan / logi...

Latest Reply
WillJMSFT
New Contributor III
  • 7 kudos

@Werner Stinckens​  Thanks for the reply! The SQL DW Connector itself is working just fine and I can retrieve the results from the SQL DW. I'm trying to extract the metadata (i.e. the Server, Database, and Table name) from the logical plan (or throu...
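One way to reach the logical plan from PySpark is through the DataFrame's underlying JVM object; note this is an internal, version-dependent API, so treat it as a sketch only:

# Internal API: inspect the analyzed logical plan of a DataFrame.
plan = df._jdf.queryExecution().analyzed()
print(plan.toString())  # relation nodes carry details about the underlying source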

5 More Replies
Dileep_Vidyadar
by New Contributor III
  • 3681 Views
  • 7 replies
  • 5 kudos

Not able to create a cluster on Community Edition for 3-4 days.

I have been learning PySpark on Community Edition for about a month. It had been great until I started facing issues creating a cluster 3-4 days ago. Sometimes it takes 30 to 60 minutes to create a cluster, and sometimes it does not even create a cl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Dileep Vidyadara​  - If your question was fully answered by @Hubert Dudek​, would you be happy to mark his answer as best?

6 More Replies
All_Users
by New Contributor II
  • 1210 Views
  • 0 replies
  • 1 kudos

How do you upload a folder of csv files from your local machine into the Databricks platform?

I am working with time-series data, where each day is a separate csv file. I have tried to load a zip file to FileStore but then cannot use the magic command to unzip, most likely because it is in the tmp folder. Is there a workaround for this proble...
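One common workaround is to unzip through the /dbfs FUSE mount from driver-side Python rather than with a shell magic against /tmp; a sketch with placeholder paths:

import zipfile

# FileStore is reachable from driver-side Python via the /dbfs mount.
with zipfile.ZipFile("/dbfs/FileStore/daily_csvs.zip") as zf:
    zf.extractall("/dbfs/FileStore/daily_csvs/")

df = spark.read.option("header", "true").csv("dbfs:/FileStore/daily_csvs/*.csv")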

BrendanTierney
by New Contributor II
  • 3163 Views
  • 6 replies
  • 3 kudos

Resolved! Community Edition is not allocating Cluster

I've been trying to use the Community Edition for the past 3 days without success. I go to run a notebook and it begins to allocate the cluster, but it never finishes. Sometimes it times out after 15 minutes. Waiting for cluster to start: Finding i...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Dileep Vidyadara - There did seem to be a problem during the time you posted. For future reference, when you're having trouble, you can check what's going on by going to the AWS Databricks Status Page. Let us know if you have any other questions.

5 More Replies
SepidehEb
by Contributor III
  • 3415 Views
  • 6 replies
  • 7 kudos

Resolved! How to get a minor DBR image?

In short, we aim to add a step to a CI job that would run tests in a container, which supposedly should mimic the DBR of our clusters – currently we use 7.3. We are considering using one of the databricksruntime images (possibly standard:7.x for now, https://hub...

Latest Reply
Atanu
Esteemed Contributor
  • 7 kudos

Hi @Sepideh Ebrahimi, since the cluster is Databricks proprietary, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but that has to be run in a cluster. But there is Databricks Connect (https://docs.databricks.com/dev-...

5 More Replies
sunil_smile
by Contributor
  • 5464 Views
  • 5 replies
  • 6 kudos

Apart from notebooks, is it possible to deploy an application (PySpark, or R+Spark) as a package or file and execute it in Databricks?

Hi, with the help of databricks-connect I was able to connect the cluster to my local IDE (PyCharm and the RStudio desktop version), develop the application, and commit the code in Git. When I try to add that repo to the Databricks workspac...

Latest Reply
Atanu
Esteemed Contributor
  • 6 kudos

Maybe you will be interested in our Databricks Connect; not sure if that resolves your issue of connecting with a 3rd-party tool and setting up your supported IDE as a notebook server: https://docs.databricks.com/dev-tools/databricks-connect.html
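Once databricks-connect is configured, locally packaged code runs against the remote cluster through an ordinary SparkSession; a minimal sketch:

from pyspark.sql import SparkSession

# With databricks-connect configured, this session points at the remote cluster.
spark = SparkSession.builder.getOrCreate()
print(spark.range(5).count())  # executed on the Databricks cluster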

4 More Replies
Abela
by New Contributor III
  • 6199 Views
  • 3 replies
  • 7 kudos

Resolved! Databricks drop and remove s3 storage files safely

After dropping a delta table using the DROP command in Databricks, is there a way to drop the S3 files in Databricks without using the rm command? Looking for a solution where junior developers can safely drop a table without messing with the rm command where...

Latest Reply
jose_gonzalez
Moderator
  • 7 kudos

Hi @Alina Bella, like @Hubert Dudek mentioned, we have a best practice guide for dropping managed tables. You can find the docs here.
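The gist of that guidance, as a sketch with hypothetical names: a table created without an explicit LOCATION is managed, so dropping it also removes its underlying files and no manual rm is needed.

# Hypothetical names; a managed table's files live in the metastore-managed
# location, so DROP TABLE removes the data as well.
spark.sql("CREATE TABLE demo_db.patients USING DELTA AS SELECT * FROM staging_patients")
spark.sql("DROP TABLE demo_db.patients")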

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group