Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ninjadev999
by New Contributor II
  • 7647 Views
  • 7 replies
  • 1 kudos

Resolved! Can't write big DataFrame into MSSQL server by using jdbc driver on Azure Databricks

I'm reading a huge CSV file containing 39,795,158 records and writing it into MSSQL Server, on Azure Databricks. The Databricks notebook is running on a cluster with 56 GB memory, 16 cores, and 12 workers. This is my code in Python and PySpark: from ...

Latest Reply
User16764241763
Honored Contributor
  • 1 kudos

Hi, if you are using Azure SQL DB Managed Instance, could you please file a support request with the Azure team? This is to review any timeouts or perf issues on the backend. Also, it seems like the timeout is coming from SQL Server, which is closing the conn...

6 More Replies
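For reference, a minimal sketch of a tuned JDBC write for a load like this; the server, database, table, credentials, and the partition/batch numbers are all placeholders to experiment with, not values from the thread:

# Read the large CSV, then write in smaller per-task batches to reduce
# pressure on SQL Server (placeholders throughout).
df = spark.read.csv("/mnt/raw/big_file.csv", header=True, inferSchema=True)

(df.repartition(48)                      # smaller partitions -> smaller transactions
   .write.format("jdbc")
   .option("url", "jdbc:sqlserver://<server>:1433;database=<db>")
   .option("dbtable", "dbo.<target_table>")
   .option("user", "<user>")
   .option("password", "<password>")
   .option("batchsize", 10000)           # rows per JDBC batch insert
   .mode("append")
   .save())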
my_community2
by New Contributor III
  • 3575 Views
  • 1 reply
  • 2 kudos

Resolved! SQL cast operator not working properly

Please have a look at the attached screenshot. Three strings converted to float each result in the same number:
22015683.000000000000000000 => 22015684
22015684.000000000000000000 => 22015684
22015685.000000000000000000 => 22015684

Latest Reply
MartinB
Contributor III
  • 2 kudos

Hi @Maciej G, I guess this has something to do with the data type FLOAT and its precision. Floats are only an approximation with a given precision. Either you should consider using the data type DOUBLE (double precision compared to FLOAT) - or, if you ...

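To make the precision point concrete, a quick check that can be run in a notebook: FLOAT is single precision, so not every integer above 2^24 (about 16.7 million) is representable, while DOUBLE and DECIMAL preserve the value.

# 22015685 > 2**24, so FLOAT rounds it to the nearest representable
# value (22015684); DOUBLE and DECIMAL keep it exact.
spark.sql("""
    SELECT CAST('22015685.000000000000000000' AS FLOAT)          AS as_float,
           CAST('22015685.000000000000000000' AS DOUBLE)         AS as_double,
           CAST('22015685.000000000000000000' AS DECIMAL(38,18)) AS as_decimal
""").show(truncate=False)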
jimnaik
by New Contributor III
  • 21140 Views
  • 2 replies
  • 3 kudos

Resolved! How to execute .sh and .py file in the workspace?

I want to execute a shell script which runs a .py file. May I know how to run .sh and .py files in the Databricks workspace?

Latest Reply
jimnaik
New Contributor III
  • 3 kudos

I tried executing like this and it worked: %sh /dbfs/***/***/***.sh

1 More Replies
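The same idea also works from a Python cell via subprocess, if the %sh magic is not convenient; the script paths below are placeholders.

# Run a shell script and a Python script stored on DBFS from a Python cell.
import subprocess

subprocess.run(["bash", "/dbfs/scripts/my_job.sh"], check=True)
subprocess.run(["python", "/dbfs/scripts/my_task.py"], check=True)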
IvNen
by New Contributor II
  • 3477 Views
  • 1 reply
  • 1 kudos

Azure Functions error communicating with a Databricks notebook

I have a connection between Azure Functions and a Databricks notebook to pull data from the notebook. That was working fine until the 7th of Feb, but then I started getting an error without a sensible error code. I have attached the stack trace and the err...

Latest Reply
IvNen
New Contributor II
  • 1 kudos

A notebook in a Databricks cluster keeps a cache while the cluster is running. If you add an import statement and then remove it, the notebook still has a cached instance of that import and will continue to work. Running code in Visual Studio against t...

Braxx
by Contributor II
  • 3398 Views
  • 2 replies
  • 3 kudos

Resolved! issue with rounding selected column in "for in" loop

This must be trivial, but I must have missed something. I have a dataframe (test1) and want to round all the columns listed in a list of columns (col_list). Here is the code I am running: col_list = ['measure1', 'measure2', 'measure3'] for i in col_list:...

Latest Reply
Braxx
Contributor II
  • 3 kudos

You're absolutely right. Thanks!

1 More Replies
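The question's code is truncated, but a common cause of this symptom is not reassigning the result of withColumn, since DataFrames are immutable. A minimal working version of the loop, assuming the goal was rounding to two decimals:

from pyspark.sql.functions import round as spark_round, col

col_list = ['measure1', 'measure2', 'measure3']
for i in col_list:
    # withColumn returns a new DataFrame; the result must be reassigned
    test1 = test1.withColumn(i, spark_round(col(i), 2))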
alejandrofm
by Valued Contributor
  • 5752 Views
  • 2 replies
  • 3 kudos

Resolved! Running vacuum on each table

Hi, in line with my question about OPTIMIZE, this is the next step. With a retention of 7 days, I could execute VACUUM on all tables once a week. Is this a recommended procedure? How can I know if I'll be getting any benefit from VACUUM, without DRY RU...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

Ideally 7 days is recommended, but discuss with data stakeholders to identify what's suitable: 7/14/28 days. To use VACUUM, first run some analytics on the behaviour of your data. Identify the % of operations that perform updates and deletes vs insert operati...

1 More Replies
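A sketch of the weekly sweep, assuming one database whose tables are all Delta (the database and table names are placeholders). Delta's DRY RUN variant lists the files that would be deleted, which is one way to gauge the benefit first.

# VACUUM every table in a database with the default 7-day retention (168 hours).
for t in spark.catalog.listTables("my_db"):
    spark.sql(f"VACUUM my_db.{t.name} RETAIN 168 HOURS")

# Preview for a single table: DRY RUN only reports what would be removed.
spark.sql("VACUUM my_db.some_table RETAIN 168 HOURS DRY RUN").show(truncate=False)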
ShriS1221
by New Contributor II
  • 8306 Views
  • 2 replies
  • 0 kudos

Removing newline characters from a Spark dataframe column

I have to remove newline characters from an entire column of a dataframe. I tried regex_replace but it's not working. Help me with this.

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please provide an example of your data and the code you've tried?

1 More Replies
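Pending the poster's example, one detail worth checking: the Spark function is spelled regexp_replace, not regex_replace. A minimal sketch with placeholder names:

from pyspark.sql.functions import regexp_replace

# Replace carriage returns and newlines in every value of one column.
df = df.withColumn("my_col", regexp_replace("my_col", "[\\r\\n]+", " "))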
NOOR_BASHASHAIK
by Contributor
  • 1906 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks PAT (personal access token) with access to databases selectively

Hi all, I am establishing a connection to Databricks from Collibra through the Spark driver. Collibra expects these details for the connection (token based): personal access token (PAT), server/workspace name, and httpPath. Upon successful connection, Collibra d...

Latest Reply
Atanu
Databricks Employee
  • 0 kudos

A PAT token is integrated with the workspace, so it will get access to all of Hive. Is there any way you can filter on the Collibra side?

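If restricting on the Databricks side is acceptable, one option (assuming legacy table access control is enabled on the cluster; the database and principal below are placeholders) is to issue the PAT for a less-privileged user whose grants cover only the wanted databases:

# Grant read access on a single database; a PAT for this principal
# will then surface only these objects.
spark.sql("GRANT USAGE, SELECT ON DATABASE allowed_db TO `collibra-user@example.com`")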
jeffreym9
by New Contributor III
  • 4331 Views
  • 4 replies
  • 0 kudos

Resolved! Hive version after Upgrade Azure Databricks from 6.4 (Spark 2) to 9.1 (Spark 3)

I have upgraded Azure Databricks from 6.4 to 9.1, which enables me to use Spark 3. As far as I know, the Hive version has to be upgraded to 2.3.7 as well, as discussed in: https://community.databricks.com/s/question/0D53f00001HKHy2CAH/how-to-upgrade-...

Latest Reply
jeffreym9
New Contributor III
  • 0 kudos

I'm asking about Databricks version 9.1. I've followed the URL given (https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore). Do you mind letting me know where in the table it mentions the supported Hive version fo...

3 More Replies
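For reference, the linked doc comes down to a couple of cluster-level Spark configs; on DBR 9.1 the built-in metastore client already matches Hive 2.3.7, so (hedged, per that doc; the external metastore DB connection settings are omitted here) the relevant part of the cluster's Spark config should look roughly like:

spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin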
thushar
by Contributor
  • 4903 Views
  • 8 replies
  • 6 kudos

Resolved! Compile all the scripts under the workspace folder

In one workspace folder I have around 100+ PySpark scripts, and all these scripts need to be compiled before running the main program. In order to compile all these files, we are using the %run magic command, like %run ../prod/netSales. Since we have 100+...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

The problem is that you can list all files in the workspace only via an API call, and then you can run each of them using dbutils.notebook.run(). This is the script to list files from the workspace (you probably need to add some filtering): import requests ctx = ...

7 More Replies
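The reply's script is cut off; a hedged reconstruction of the same idea (list one folder via the Workspace API, then run each notebook) could look like this, with the folder path a placeholder:

import requests

# Pull the API endpoint and token from the notebook context.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host = ctx.apiUrl().get()
token = ctx.apiToken().get()

resp = requests.get(f"{host}/api/2.0/workspace/list",
                    headers={"Authorization": f"Bearer {token}"},
                    params={"path": "/prod"})

for obj in resp.json().get("objects", []):
    if obj["object_type"] == "NOTEBOOK":      # skip folders, files, libraries
        dbutils.notebook.run(obj["path"], 0)  # timeout 0 = no timeout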
thushar
by Contributor
  • 4621 Views
  • 4 replies
  • 3 kudos

Resolved! Deploy tar.gz package from private GitHub

We created a Python package (.tar.gz) and kept it in a private GitHub repo. We are able to connect to that repo (using a PAT) from the Azure Databricks notebook. Our requirement is to install that package from the .tar.gz file for that notebook: "pip install https://USE...

Latest Reply
Rahul_Samant
Contributor
  • 3 kudos

To install the package using pip you need to package the repo using setup.py. Check this link for more details: https://packaging.python.org/en/latest/tutorials/packaging-projects/. Alternatively, you can pass the tar.gz using --py-files while submi...

3 More Replies
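Combining the two replies: once the repo has a setup.py, pip can install straight from Git over HTTPS with the PAT embedded in the URL. Everything in angle brackets is a placeholder.

# In its own notebook cell:
%pip install git+https://<PAT>@github.com/<org>/<repo>.git@<branch>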
Vibhor
by Contributor
  • 4171 Views
  • 5 replies
  • 1 kudos

Resolved! Notebook level automated pipeline monitoring or failure notif

Hi, is there any way, other than ADF monitoring, to get notebook-level execution details in an automated way, without having to go into each pipeline and check?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Vibhor Sethi - Would you be happy to mark @Werner Stinckens' answer as best if it resolved your question?

4 More Replies
