Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Valued Contributor II
  • 6437 Views
  • 2 replies
  • 0 kudos

Lakehouse Monitoring & Expectations

Dears, has anyone successfully used the lakehouse monitoring & expectations features together at scale to measure the data quality of tables - for example, to conduct freshness checks, consistency checks, etc.? I'd appreciate it if you could share the lessons learn...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

Not sure if you are still looking for this. Here is a Medium article - https://piethein.medium.com/data-quality-within-lakehouses-0c9417ce0487 - where you can see the detailed implementation.

1 More Replies
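The freshness checks the poster asks about reduce to a simple age comparison against each table's last commit time. A minimal plain-Python sketch (this is not the Lakehouse Monitoring API; the 24-hour threshold and the timestamps are made-up illustrations):

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_updated: datetime, max_age: timedelta) -> bool:
    """Return True if the table's last update is within the allowed age."""
    return datetime.now(timezone.utc) - last_updated <= max_age

# Example: flag a table as stale if it has not been updated in 24 hours.
last_commit = datetime.now(timezone.utc) - timedelta(hours=30)
print(is_fresh(last_commit, timedelta(hours=24)))  # → False (stale)
```

In practice the `last_updated` value would come from the table's commit history; the check itself stays this simple.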
RobsonNLPT
by Contributor III
  • 1177 Views
  • 2 replies
  • 0 kudos

Spark Configurations with Serverless Compute

I have a few problems converting my notebooks to run with serverless compute. Right now I can't set my Delta userMetadata at session and scope level using Spark or SQL. Setting userMetadata in a DataFrame write operation works using the option: opti...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @RobsonNLPT, there is an internal feature request for this use case: https://databricks.aha.io/ideas/ideas/DB-I-12401. It is still at the idea stage, with no ETA on its implementation yet.

1 More Replies
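For reference, the per-write option the poster says does work can be sketched like this. A minimal sketch: the metadata string and the table path in the commented usage are hypothetical; the session-level config is the part that serverless currently restricts.

```python
def delta_write_options(user_metadata: str) -> dict:
    """Per-commit options for a Delta write; userMetadata is stored in the
    table history for that commit."""
    return {"userMetadata": user_metadata}

# Hypothetical usage on classic compute (df is an existing DataFrame):
# (df.write.format("delta")
#    .options(**delta_write_options("nightly-load-2025-01-31"))
#    .mode("append")
#    .save("/path/to/table"))
```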
ankitmit
by New Contributor III
  • 1233 Views
  • 2 replies
  • 0 kudos

Unknown Location of files for tables created using DLT

Hi all, I created a catalog and schema using a managed location, but I don't see any catalogs directory within the S3 bucket path mentioned in the image above. Also, I created a schema with a managed location, and I expected all the tables created within t...

Data Engineering
Databricks
dlt
Storage
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hello, thank you for your question. Could you confirm that your issue is regarding the location of files for tables created using Delta Live Tables (DLT) when utilizing managed storage locations at the catalog and schema levels? Specifically, it seem...

1 More Replies
Dulce42
by New Contributor II
  • 2446 Views
  • 1 replies
  • 0 kudos

Exports history chats from genie space

Hi community! In the last few days I have been searching for how to export the chat history from my Genie space, but I couldn't find anything. Have some of you done this exercise, so you can guide me?

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi, thank you for the question! I haven't done this myself, but for context, are you referring to AI/BI Genie Space? e.g. https://docs.databricks.com/en/genie/index.html or https://learn.microsoft.com/en-us/azure/databricks/genie/ If so, then it doesn't ...

gilt
by New Contributor III
  • 4126 Views
  • 9 replies
  • 2 kudos

Auto Loader ignores data with modifiedBefore

Hello, I am trying to ingest CSV data with Auto Loader from an Azure Data Lake. I want to perform batch ingestion by using a scheduled job and the following trigger:  .trigger(availableNow=True) The CSV files are generated by Azure Synapse Link. If m...

Latest Reply
kostoska
New Contributor II
  • 2 kudos

Databricks should resolve this and introduce two options: a soft modifiedBefore and a hard modifiedBefore (files that are going to be ignored forever). In addition, this is not explained in the documentation, so it is a bit frustrating as it is not intui...

8 More Replies
aliacovella
by Contributor
  • 3227 Views
  • 3 replies
  • 1 kudos

Resolved! Custom Checkpointing

The following is my scenario: I need to query, on a daily basis, an external table that maintains a row version. I would like to be able to query for all records where the row version is greater than the max previously processed row version. The sour...

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, I totally agree with VZLA. Within my internal team we have a similar issue, and we used a table to track the latest version of each table, since we don't have a streaming process on our side. DLT pipelines could be a choice, but it also depends on whether you ...

2 More Replies
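The tracking-table pattern described in this thread boils down to high-water-mark logic like the following; a plain-Python sketch where the record shape (`id`, `row_version`) is hypothetical:

```python
def select_new_rows(rows: list[dict], last_version: int) -> tuple[list[dict], int]:
    """Return rows above the stored high-water mark, plus the new mark
    to persist back to the tracking table for the next daily run."""
    new_rows = [r for r in rows if r["row_version"] > last_version]
    new_mark = max((r["row_version"] for r in new_rows), default=last_version)
    return new_rows, new_mark

batch = [{"id": 1, "row_version": 5}, {"id": 2, "row_version": 9}]
fresh, mark = select_new_rows(batch, last_version=5)
print(fresh, mark)  # → [{'id': 2, 'row_version': 9}] 9
```

In the real job the filter would run as a pushed-down predicate against the external table, and `mark` would be written back to the tracking table only after the batch commits.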
ashraf1395
by Honored Contributor
  • 3039 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Workflow design

I have 7-8 different DLT pipelines which have to run at the same time according to their batch type, i.e. hourly and daily. Right now they are triggered effectively according to their batch type. I want to move to a next stage where I want to clu...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @VZLA, I got the idea. There will be a small change in the way we will use it. Since we don't schedule the workflow in Databricks, we trigger it using the API. So I will pass a job parameter along with the trigger, according to the timestamp, wheth...

2 More Replies
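The approach in this reply can be sketched as a Jobs API run-now call carrying a job parameter the workflow branches on. A minimal sketch: the parameter name `batch_type`, the job ID, and the workspace host are all hypothetical placeholders.

```python
def run_now_payload(job_id: int, batch_type: str) -> dict:
    """Request body for POST /api/2.1/jobs/run-now; the job's tasks can
    branch on the batch_type job parameter (e.g. hourly vs daily)."""
    return {"job_id": job_id, "job_parameters": {"batch_type": batch_type}}

# Hypothetical trigger (stdlib only; host/token/job_id are placeholders):
# import json, urllib.request
# req = urllib.request.Request(
#     "https://<workspace-host>/api/2.1/jobs/run-now",
#     data=json.dumps(run_now_payload(123, "hourly")).encode(),
#     headers={"Authorization": "Bearer <token>",
#              "Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```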
maddan80
by New Contributor II
  • 1405 Views
  • 3 replies
  • 0 kudos

History load from Source and

Hi, as part of our requirement we wanted to load a huge amount of historical data from the source system to Databricks in Bronze and then process it to Gold. We wanted to use batch read and write so that the historical load is done, and then for the delta o...

Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

I imported 16 TB of data using ADF. In this scenario I'd create a process that extracts data from a source using ADF and then executes the rest of the logic to populate tables in Gold. For the new data I'd create a separate process using Autoloade...

2 More Replies
javiomotero
by New Contributor III
  • 4481 Views
  • 4 replies
  • 4 kudos

How to consume Fabric Datawarehouse inside a Databricks notebook

Hello, I'm having a hard time figuring out (and finding the right documentation on) how to connect my Databricks notebook to consume tables from a Fabric Data Warehouse. I've checked this, but it seems to work only with OneLake, and this, but I'm not ...

Data Engineering
datawarehouse
fabric
Latest Reply
javiomotero
New Contributor III
  • 4 kudos

Hello, I would like a few more options for reading views. Using abfss is fine for reading tables, but I don't know how to load views, which are visible in the SQL endpoint. Is there any alternative for connecting to Fabric and being abl...

3 More Replies
Avinash_Narala
by Databricks Partner
  • 2274 Views
  • 3 replies
  • 4 kudos

Redshift to Databricks Migration

Hi, I want a detailed, step-by-step plan to migrate my data from Redshift to Databricks: where to start, what to assess, and what to migrate. It would really help me if you could provide a detailed explanation of the migration. Thanks in advance.

Latest Reply
MariuszK
Valued Contributor III
  • 4 kudos

I migrated Oracle to Databricks and have experience with Redshift. The cost and effort will depend on your technical stack:
- What do you use for ETL?
- What do you use for data ingestion?
- Reporting tools?
In general, the simplest steps are: data and mo...

2 More Replies
ahen
by New Contributor
  • 4941 Views
  • 1 replies
  • 0 kudos

Deployed DABs job via Gitlab CICD. It is creating duplicate jobs.

We had an error in the DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

@ahen When you used the --force-lock option during the Databricks Asset Bundle (DAB) deployment, it likely bypassed certain checks that would normally prevent duplicate resource creation. This option is used to force a deployment even when a lock is ...

shubham_007
by Contributor III
  • 3478 Views
  • 6 replies
  • 0 kudos

Resolved! Need urgent help and guidance on information/details with reference links on below topics:

Dear experts, I need urgent help and guidance on information/details, with reference links, on the topics below: steps for package installation with serverless in Databricks; what are Delta Lake connectors with serverless; how to run Delta Lake queries outside...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Were you able to review the documentation provided here: https://docs.databricks.com/en/compute/serverless/dependencies.html#install-notebook-dependencies?

5 More Replies
mrkure
by New Contributor II
  • 1422 Views
  • 2 replies
  • 0 kudos

Databricks connect, set spark config

Hi, I am using Databricks Connect to compute with a Databricks cluster. I need to set some Spark configurations, namely spark.files.ignoreCorruptFiles. In my experience, setting a Spark configuration in Databricks Connect for the current session has...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Have you tried setting it up in your code as:

from pyspark.sql import SparkSession

# Create a Spark session
spark = (
    SparkSession.builder
    .appName("YourAppName")
    .config("spark.files.ignoreCorruptFiles", "true")
    .getOrCreate()
)

# Yo...

1 More Replies
Buranapat
by New Contributor II
  • 3045 Views
  • 4 replies
  • 4 kudos

Error when accessing 'num_inserted_rows' in Spark SQL (DBR 15.4 LTS)

Hello Databricks Community, I've encountered an issue while trying to capture the number of rows inserted after executing a SQL insert statement in Databricks (DBR 15.4 LTS). My code is attempting to access the number of inserted rows as follows: row...

Latest Reply
GeorgeP1
Databricks Partner
  • 4 kudos

Hi, we are experiencing the same issue. We also turned on liquid clustering on the table, and we had additional checks on the inserted data information, which was really helpful. @GavinReeves3 did you manage to solve the issue? @MuthuLakshmi any idea? Thank ...

3 More Replies
zg
by New Contributor III
  • 2527 Views
  • 4 replies
  • 3 kudos

Resolved! Unable to Create Alert Using API

Hi all, I'm trying to create an alert using the Databricks REST API, but I keep encountering the following error: Error creating alert: 400 {"message": "Alert name cannot be empty or whitespace"}. The payload was: {"alert": {"seconds_to_retrigger": 0, "display_name": "A...

Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @zg, you are sending the payload intended for the new endpoint (/api/2.0/sql/alerts) to the old endpoint (/api/2.0/preview/sql/alerts). These are the docs for the old endpoint: https://docs.databricks.com/api/workspace/alertslegacy/create As you can see ...

3 More Replies
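To make the endpoint mix-up concrete: the fix is only the URL, not the payload. A minimal sketch, where the workspace host is a placeholder and the alert body is whatever new-style payload you already built:

```python
def create_alert_request(host: str, alert_body: dict) -> tuple[str, dict]:
    """URL + payload for the current Alerts API. The legacy endpoint
    (/api/2.0/preview/sql/alerts) expects a different schema ("name"
    rather than "display_name"), which is what triggers the 400 above."""
    return f"https://{host}/api/2.0/sql/alerts", alert_body

url, body = create_alert_request("<workspace-host>",
                                 {"alert": {"display_name": "My alert"}})
print(url)  # → https://<workspace-host>/api/2.0/sql/alerts
```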