Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by IkramMecheri, New Contributor II
  • 10542 Views
  • 5 replies
  • 2 kudos

ImportError: No module named 'bs4'

Hi, I would like to do some web scraping; however, I am unable to import the libraries I traditionally use for that task: import requests; from bs4 import BeautifulSoup
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Ikram Mecheri, just a friendly follow-up: do you still need help, or did the responses above help you find the solution? Please let us know.
4 More Replies
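The follow-up above never states the fix, but this error normally means the library is simply not installed on the cluster; installing it with `%pip install beautifulsoup4 requests` in a notebook cell (or via the cluster Libraries UI) is the usual remedy. One classic stumbling block is that the PyPI package name differs from the import name, which this small stdlib-only sketch checks for:

```python
import importlib.util

# PyPI package name -> Python import name; bs4 is the classic mismatch
# behind "I installed it but the import still fails" confusion.
pypi_to_import = {"beautifulsoup4": "bs4", "requests": "requests"}

def is_installed(import_name: str) -> bool:
    """True if the module is importable in the current environment."""
    return importlib.util.find_spec(import_name) is not None

print({pkg: is_installed(mod) for pkg, mod in pypi_to_import.items()})
```

Run before and after the `%pip install` to confirm the environment actually changed.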
by Constantine, Contributor III
  • 1855 Views
  • 2 replies
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs, etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...
Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @John Constantine, did you try the above suggestions?
1 More Replies
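For readers hitting the same symptom: when a table registered over an existing path reads back all NULLs, a common culprit is a mismatch between the schema recorded in the Delta log and the schema the table exposes (for example, a column list supplied in CREATE TABLE, or case differences in nested struct fields). A hedged diagnostic sketch, with placeholder names and paths:

```sql
-- Query the path directly; if these rows are non-NULL while SELECTs against
-- table_name return NULL, the table's declared schema does not match the log:
SELECT * FROM delta.`s3://bucket/path` LIMIT 5;
DESCRIBE TABLE table_name;
```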
by Databricks_7045, New Contributor III
  • 3233 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting Delta Tables from any Tools

Hi Team, to access SQL tables we use tools like TOAD or SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you.
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rajesh Vinukonda, hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?
1 More Replies
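Generic SQL clients (DBeaver, DataGrip, and ODBC-based workflows) can reach Delta tables through a cluster's or SQL warehouse's JDBC/ODBC endpoint. A sketch of the JDBC URL shape; the host and HTTP path below are placeholders, and the real values come from the JDBC/ODBC tab in the workspace UI:

```python
# Placeholder values; copy the real ones from Compute -> cluster ->
# Advanced options -> JDBC/ODBC in the Databricks workspace.
host = "adb-1234567890123456.7.azuredatabricks.net"
http_path = "/sql/1.0/warehouses/abc123"

jdbc_url = (
    f"jdbc:databricks://{host}:443;transportMode=http;ssl=1;"
    f"httpPath={http_path};AuthMech=3"  # AuthMech=3 = personal access token
)
print(jdbc_url)
```

The tool then authenticates with a personal access token as the password.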
by saniafatimi, New Contributor II
  • 2725 Views
  • 2 replies
  • 1 kudos

How to migrate power bi reports to databricks

I have a sample set of Power BI (.pbix) reports with all dropdowns, tables, filters, etc. Now I would like to migrate these reports to Databricks. Whatever visuals are created in Power BI, I would like to create the same in Databricks from scratch. I wou...
Latest Reply
Neeljy
New Contributor II
  • 1 kudos

First, make sure the Databricks cluster is operational. These are the steps needed for integration between Azure Databricks and Power BI Desktop. 1. Constructing the URL for the connection: connect to the cluster, and click the Advanced...
1 More Replies
by Constantine, Contributor III
  • 1958 Views
  • 1 reply
  • 4 kudos

Resolved! How to process a large delta table with UDF ?

I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this: def my_udf(data): return ... udf_func = udf(my_udf, StringType()) data...
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

That UDF will run row by row in plain Python, so better not to use it for such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
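The linked page describes pandas UDFs, which receive whole batches as pandas Series instead of one value per Python call, cutting serialization overhead dramatically at this scale. A minimal sketch; the column names are made up, and the core function is plain pandas so it can be tested off-cluster:

```python
import pandas as pd

def upper_batch(s: pd.Series) -> pd.Series:
    # Operates on a whole batch at once; no per-row Python call.
    return s.str.upper()

# On Databricks this would be wrapped as a vectorized UDF, e.g.:
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import StringType
#   upper_udf = pandas_udf(upper_batch, returnType=StringType())
#   data = data.withColumn("new_col", upper_udf(data["col"]))

print(upper_batch(pd.Series(["delta", "lake"])).tolist())
```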
by george2020, New Contributor II
  • 934 Views
  • 0 replies
  • 2 kudos

Using the Databricks Repos API to bring Repo in top-level production folder to latest version

I am having an issue with a GitHub Actions workflow using the Databricks Repos API. We want the API call in the GitHub Action to bring the Repo in our Databricks Repos top-level folder to the latest version on a merge into the main branch. The Github Actio...
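There are no replies on this thread, but for reference, pulling a repo to the head of a branch is a PATCH against the Repos API. A sketch of the call a GitHub Action might build; the workspace URL, repo id, and token are placeholders:

```python
import json

workspace = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
repo_id = 123                                                     # placeholder

url = f"{workspace}/api/2.0/repos/{repo_id}"
payload = json.dumps({"branch": "main"})  # pull the repo to the head of main

# From a workflow step this could be sent as:
#   curl -s -X PATCH "$URL" \
#        -H "Authorization: Bearer $DATABRICKS_TOKEN" \
#        -d '{"branch": "main"}'
print(url, payload)
```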
by wyzer, Contributor II
  • 3035 Views
  • 2 replies
  • 2 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premise SQL Server? Thanks.
Latest Reply
wyzer
Contributor II
  • 2 kudos

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.
1 More Replies
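For anyone landing here, the JDBC route the accepted answer alludes to looks roughly like this; the hostname, database, table, and credentials are placeholders, and the cluster needs network access to the on-prem host plus the Microsoft SQL Server JDBC driver installed:

```python
# Placeholder connection details for an on-prem SQL Server.
jdbc_url = "jdbc:sqlserver://onprem-host:1433;databaseName=mydb"
props = {
    "user": "etl_user",
    "password": "***",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# On a cluster, a Spark DataFrame would then be written with:
#   df.write.jdbc(url=jdbc_url, table="dbo.target_table",
#                 mode="append", properties=props)
print(jdbc_url)
```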
by alejandrofm, Valued Contributor
  • 2315 Views
  • 4 replies
  • 5 kudos

Resolved! Show Vacuum operation result (files deleted) without DRY RUN

Hi, I'm running some scheduled vacuum jobs and would like to know how many files were deleted without doing all the computation twice, with and without DRY RUN. Is there a way to accomplish this? Thanks!
Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

We have to enable logging to capture the logs for vacuum: spark.conf.set("spark.databricks.delta.vacuum.logging.enabled", "true")
3 More Replies
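Once that logging flag is set, the deleted-file count is recorded in the table history rather than printed, so a follow-up query like this (the table name is a placeholder) surfaces it without a second DRY RUN pass:

```sql
-- Look for the VACUUM END entry; its operationMetrics map includes the
-- number of files deleted:
DESCRIBE HISTORY my_table;
```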
by Constantine, Contributor III
  • 3368 Views
  • 3 replies
  • 2 kudos

Resolved! OPTIMIZE throws an error after doing MERGE on the table

I have a table on which I do an upsert, i.e. MERGE INTO table_name ... After which I run OPTIMIZE table_name, which throws an error: java.util.concurrent.ExecutionException: io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read...
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can try to change the isolation level: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/isolation-level. In the merge it is good to specify all partitions in the merge conditions. It can also happen when the script is running concurrently.
2 More Replies
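The isolation-level change the reply links to is set as a table property; a sketch with a placeholder table name (WriteSerializable is the looser level that lets OPTIMIZE and concurrent MERGEs coexist more often):

```sql
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable');
```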
by wyzer, Contributor II
  • 2880 Views
  • 2 replies
  • 4 kudos

Resolved! How to show the properties of the folders/files from DBFS ?

Hello, how do I show the properties of the folders/files in DBFS? Currently I am using this command: display(dbutils.fs.ls("dbfs:/")). But it only shows: path, name, size. How do I show these properties: CreatedBy (name), CreatedOn (date), ModifiedBy (name), Modi...
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

The only idea is to use the %sh magic command, but there is no name (just root).
1 More Replies
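DBFS does not record created-by/modified-by users at all, but modification times are reachable: newer runtimes include a modificationTime field in dbutils.fs.ls output, and the /dbfs FUSE mount answers ordinary os.stat calls. A sketch; the DBFS path is a placeholder, and the demo line stats the current directory so it runs anywhere:

```python
import datetime
import os

def mtime_iso(path: str) -> str:
    """Return a path's last-modified time as an ISO-8601 string."""
    ts = os.stat(path).st_mtime
    return datetime.datetime.fromtimestamp(ts).isoformat()

# On Databricks, via the FUSE mount:
#   print(mtime_iso("/dbfs/mnt/raw/some_file.csv"))
print(mtime_iso("."))
```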
by SettlerOfCatan, New Contributor
  • 2118 Views
  • 0 replies
  • 0 kudos

Access data within the blob storage without downloading

Our customer is using Azure's Blob Storage service to save big files so that we can work with them using an Azure online service, like Databricks. We want to read and work with these files with a computing resource obtained by Azure directly without d...

Tags: blob-storage Azure-ML fileytypes blob
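There are no replies yet, but the usual pattern is that Spark reads blob data in place over the abfss:// scheme, so nothing is downloaded to a local machine; the workers stream the bytes directly from storage. A sketch of the URI shape; the container, account, and path are placeholders, and auth setup is elided:

```python
container = "mycontainer"   # placeholder
account = "mystorageacct"   # placeholder

uri = f"abfss://{container}@{account}.dfs.core.windows.net/big/file.parquet"

# After configuring access (account key, SAS, or service principal), e.g.:
#   spark.conf.set(f"fs.azure.account.key.{account}.dfs.core.windows.net", key)
#   df = spark.read.parquet(uri)
print(uri)
```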
by MattM, New Contributor III
  • 2942 Views
  • 5 replies
  • 5 kudos

Resolved! Schema Parsing issue when datatype of source field is mapped incorrect

I have a complex JSON file which has a massive struct column. We regularly have issues when we try to parse this JSON file by forming our case class to extract the fields from the schema. With this approach the issue we are facing is that if one data type of...
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hey there, @Matt M: if @Hubert Dudek's response solved the issue, would you be happy to mark his answer as best? It helps other members find the solution more quickly.
4 More Replies
by RiyazAli, Valued Contributor
  • 4172 Views
  • 4 replies
  • 3 kudos

Resolved! Where does the files downloaded from wget get stored in Databricks?

Hey team! All I'm trying to do is download a CSV file stored on S3 and read it using Spark. Here's what I mean: !wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv. If I download this "yellow_tripdata_2020-01.csv", where exactly wo...
Latest Reply
RiyazAli
Valued Contributor
  • 3 kudos

Hi @Kaniz Fatma, thanks for the reminder. Hey @Hubert Dudek, thank you very much for your prompt response. Initially, I was using urllib3 to 'GET' the data residing in the URL, so I wanted an alternative for the same. Unfortunately, the requests libr...
3 More Replies
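To summarize the thread: a shell-level download lands on the driver's local disk (the working directory is typically /databricks/driver), which the executors cannot read, so the file has to be copied into DBFS before Spark can load it. A sketch with placeholder paths:

```python
import shutil

# Where a `%sh wget -P /tmp ...` download ends up (driver-local disk):
local_path = "/tmp/yellow_tripdata_2020-01.csv"
# DBFS as seen through the driver's FUSE mount:
dbfs_path = "/dbfs/tmp/yellow_tripdata_2020-01.csv"

# On a cluster:
#   shutil.copy(local_path, dbfs_path)
#   df = spark.read.csv("dbfs:/tmp/yellow_tripdata_2020-01.csv", header=True)
print(local_path, "->", dbfs_path)
```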
by chaitanya, New Contributor II
  • 3108 Views
  • 3 replies
  • 4 kudos

Resolved! While loading Data from blob to delta lake facing below issue

I'm calling the stored proc, then storing the result into a pandas dataframe, then creating a list. While creating the list I get the error below: Databricks execution failed with error state Terminated. For more details please check the run page url: path. An error occurred w...
Latest Reply
shan_chandra
Esteemed Contributor III
  • 4 kudos

@chaitanya, could you please try disabling Arrow optimization and see if this resolves the issue? spark.sql.execution.arrow.enabled false; spark.sql.execution.arrow.pyspark.enabled false
2 More Replies
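Spelled out, the two settings the reply names are ordinary Spark confs, set from a notebook like this (spark is the session object every Databricks notebook provides):

```python
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
```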
by steven_vcnt, New Contributor III
  • 2389 Views
  • 3 replies
  • 4 kudos

Resolved! ADLSGen2: Cannot display Delta table summary from Data>Table tab on Databricks

Hello, I have set up my account storage on Azure with ADLS Gen2 and I have succeeded in saving the Delta table on my ADLS Gen2, and from there I have created my Delta table on Databricks. However, I am unable to display the summary of my Delta table unde...
Latest Reply
steven_vcnt
New Contributor III
  • 4 kudos

Hello, following Hubert's comment: in order to create a Delta table on Databricks from Azure, I had to use the CLONE argument to copy the data plus the metadata of my Delta table on Azure. In order to set up the connection between Databricks and A...
2 More Replies
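The CLONE approach mentioned above looks roughly like this; the table name and abfss path are placeholders (DEEP CLONE copies data plus metadata, while SHALLOW CLONE copies only metadata):

```sql
CREATE OR REPLACE TABLE my_metastore_table
DEEP CLONE delta.`abfss://container@account.dfs.core.windows.net/path/to/table`;
```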